Introduction


Many studies have examined the factors affecting academic achievement (in
mathematics, science or reading) using different statistical methods. The
Programme for International Student Assessment (PISA), the Trends in
International Mathematics and Science Study (TIMSS) and the Progress in
International Reading Literacy Study (PIRLS) help countries to identify the
areas that need improvement and to raise their ranking among other countries.

In the literature, researchers have generally used the PISA, TIMSS and PIRLS
datasets to determine the factors affecting students’ achievement
(Carnoy, Khavenson, & Ivanova, 2015; Sebastian, Moon, & Cunningham, 2016;
Rutkowski, Rutkowski, Wild, & Burroughs, 2017). Von Davier, Hao, Liu and
Kyllonen (2017) developed the collaborative problem-solving framework behind
the ETS Collaborative Science Assessment Prototype (ECSAP), which was based
on the PISA 2015 survey and the Assessment and Teaching of 21st Century Skills
(ATC21S) frameworks. Sheldrake, Mujtaba, and Reiss (2017) analyzed PISA 2006
and PISA 2015 science test scores for students in England.

In addition to classical approaches, data mining algorithms have been used
in education to assess or compare students’ performance in science,
mathematics or reading. Kabakchieva (2013) analyzed data obtained from
university management using the J48, Naive Bayes, BayesNet, k-NN and JRip
algorithms; J48 and JRip were found to be more reliable and performed better
than the other methods. Shariri, Husain, and Rashid (2015) studied the
prediction of students’ performance in academic institutions in Malaysia,
with the aim of improving achievement, using the Decision Tree, Neural
Network, Naive Bayes, k-NN and SVM algorithms. Prediction accuracy was
highest for the Neural Network (98%), followed by the Decision Tree, SVM,
k-NN and Naive Bayes, respectively. Martinez Abad and Chaparro Caso López
(2017) investigated student, social and school factors using classification
techniques. As mentioned previously, the results showed that student-related
variables were the most influential factors for academic success (Cortez &
Silva, 2008). With data mining techniques, however, past evaluations were
found to influence student performance, and parents’ education and job and
alcohol consumption were other important variables for success.


Problem of Research


The government of Turkey set the goal of reaching a high level of education in its 10th development plan, published
in 2014. To this end, the government has implemented many major improvements in education since 2010 to raise
its quality. However, Turkey has not been among the highest-performing countries in any PISA survey. In 2015,
Turkey ranked 54th out of 72 countries in science achievement, which is a very disappointing result. To raise
Turkish educational performance, the factors that have a significant effect on students’ achievement should be
identified. Turkey should then focus on the areas for development by setting a good reference point. In terms of
science achievement, the most successful country in the PISA 2015 survey was Singapore, which is taken as the
reference point for Turkey.

Researchers have tried to determine the factors affecting students’ achievement using many different techniques
in order to reach a high level of estimation accuracy. The literature contains many studies on modeling students’
achievement, especially with classification- and prediction-based techniques. However, it generally does not
establish clearly which method performs better in terms of estimation accuracy, because the differences between
the results of the techniques used are minor. Although researchers have made important contributions to
educational studies using many statistical techniques, the MARS and CART algorithms have rarely been used in this
field. In this research, the MARS and CART methods are applied to the PISA 2015 survey and their performance is
compared.


Research Focus 


The research focused on determining the factors that affect the science achievement of Turkish students. It
specifically sought answers to the following questions:
1. Which factors have a significant impact on students’ science achievement?
2. Which method has the best prediction performance based on the goodness of fit criteria?
Answering these questions makes the following actions possible for increasing students’ science achievement:
1. The most significant factors of science achievement are determined. Thus, educational policymakers
can focus on these factors and prioritize educational policies to increase science achievement.
2. The method with the better classification and prediction performance of the two algorithms
used in the research is determined. Thus, this research can serve as a reference for further research
when choosing a model for achievement prediction in education.


Methodology of Research 
General Background 


PISA is an international survey that has been conducted by the Organisation for Economic Co-operation and
Development (OECD) since 2000. It is held every three years to measure how well students can make predictions
using what they have learned and interpret their knowledge on subjects with which they are unfamiliar. Thus, it
can help countries to implement the necessary educational policies. In all PISA surveys, 15-year-old students’
knowledge of mathematics, science and reading is assessed. Approximately 550,000 students from 72 countries
participated in the PISA 2015 survey. Students took two-hour computer-based tests consisting of multiple-choice
and open-ended questions. Students also answered many questions about themselves, their homes, their schools and
their learning experiences. The fieldwork was carried out during 2015, and a two-step stratified sampling
technique was used (MEB, 2016; OECD, 2018).


Sample 
Singapore was the top-performing country in the PISA 2015 survey in terms of science achievement. Thus, the
Turkey and Singapore datasets were considered in order to determine the main differences between these countries
in science achievement and in other factors such as economic, social and cultural status and wealth. In total,
6,115 students from Singapore and 5,895 students from Turkey participated in the PISA 2015 study. Because of
missing values in the variables used in this research, 4,735 Singaporean and 4,569 Turkish students were included
in the analysis.

PISA surveys contain three basic question sets: the student, school, and teacher questionnaires. In this
research, only student-related factors affecting students’ achievement were examined.


The student questionnaire contains both single variables (such as gender) and indexed variables (such as the
index of economic, social and cultural status and home possessions). The indexed variables are scaled according
to Item Response Theory (MEB, 2016).


The student-related factors used in the analysis are given in Table 1. 


Table 1. Variables examined in the analysis.

Indexed Variable Name   Description                                             Question no. in PISA Questionnaire
GENDER                  GENDER                                                  ST004
CULTPOSS                CULTURAL POSSESSIONS AT HOME                            ST011, ST012
HEDRES                  HOME EDUCATIONAL RESOURCES                              ST011
WEALTH                  FAMILY WEALTH                                           ST011, ST012
ICTRES                  ICT RESOURCES                                           ST011, ST012
HOMEPOS                 HOME POSSESSIONS                                        ST011, ST012, ST013
ESCS                    INDEX OF ECONOMIC, SOCIAL AND CULTURAL STATUS           ST005, ST006, ST007, ST008, ST011, ST012, ST013, ST014, ST015
BELONG                  SENSE OF BELONGING TO SCHOOL                            ST034
UNFAIRTCHR              TEACHER FAIRNESS                                        ST039
MMINS                   MATHEMATICS LEARNING TIME (MINUTES PER WEEK)            ST059, ST061
SMINS                   SCIENCE LEARNING TIME (MINUTES PER WEEK)                ST059, ST061
COOPERATE               ENJOY COOPERATION                                       ST082
CPSVALUE                VALUE COOPERATION                                       ST082
ENVAWARE                ENVIRONMENTAL AWARENESS                                 ST092
ENVOPT                  ENVIRONMENTAL OPTIMISM                                  ST093
JOYSCIE                 ENJOYMENT OF SCIENCE                                    ST094
INTBRSCI                INTEREST IN BROAD SCIENCE TOPICS                        ST095
DISCLISCI               DISCIPLINARY CLIMATE IN SCIENCE CLASSES                 ST097
IBTEACH                 INQUIRY-BASED SCIENCE TEACHING AND LEARNING PRACTICES   ST098
TEACHSUP                TEACHER SUPPORT IN SCIENCE CLASSES                      ST100
TDTEACH                 TEACHER-DIRECTED SCIENCE INSTRUCTION                    ST103
PERFEED                 PERCEIVED FEEDBACK                                      ST104
ADINST                  ADAPTION OF INSTRUCTION                                 ST107
INSTSCIE                INSTRUMENTAL MOTIVATION                                 ST113
ANXTEST                 TEST ANXIETY                                            ST118
MOTIVAT                 ACHIEVEMENT MOTIVATION                                  ST119
EMOSUPS                 PARENTS EMOTIONAL SUPPORT                               ST123
SCIEEFF                 SCIENCE SELF-EFFICACY                                   ST129
EPIST                   EPISTEMOLOGICAL BELIEFS                                 ST131
SCIEACT                 SCIENCE ACTIVITIES                                      ST146
AUTICT                  STUDENTS’ PERCEIVED AUTONOMY RELATED TO ICT USE         IC015
COMPICT                 STUDENTS’ PERCEIVED ICT COMPETENCE                      IC014
ENTUSE                  ICT USE OUTSIDE OF SCHOOL LEISURE                       IC008
HOMESCH                 ICT USE OUTSIDE OF SCHOOL FOR SCHOOLWORK                IC010
ICTHOME                 ICT AVAILABLE AT HOME INDEX                             IC001
ICTSCH                  ICT AVAILABLE AT SCHOOL INDEX                           IC009
INTICT                  STUDENTS’ ICT INTEREST                                  IC013
SOIAICT                 STUDENTS’ ICT AS A TOPIC IN SOCIAL INTERACTION          IC016
USESCH                  USE OF ICT AT SCHOOL IN GENERAL                         IC011

In Table 1, gender was defined as a categorical variable (0 = female, 1 = male); the other variables were
defined as index variables.


Instrument and Procedures 


CART is based on a recursive partitioning method, which is used for predicting a categorical dependent
variable (classification) or a continuous dependent variable (regression). CART, a common decision tree
algorithm, was first introduced in 1984 (Breiman, Friedman, Olshen, & Stone, 1984). CART is sensitive to outliers
and missing data.

Using all the independent variables, a CART tree is constructed by repeatedly splitting each subset into two
child nodes. Impurity or diversity measures such as Gini, least-squared deviation, twoing and ordered twoing
are used to choose the best predictor. The aim is to obtain subgroups that are as homogeneous as possible
(Breiman, Friedman, Olshen, & Stone, 1984; Ture, Tokatli, & Kurt, 2009).
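
To make the splitting step concrete, the following is a minimal sketch of how a single regression split could be chosen with the least-squared deviation criterion. It is an illustration only, not the implementation used in this research: the function name `best_split` and the toy data are hypothetical, and candidate thresholds are assumed to be the midpoints between consecutive distinct values of one predictor.

```python
def best_split(x, y):
    """Return (threshold, sse) for the binary split of x that minimises the
    summed squared deviation of y around the mean of each child node."""
    def sse(values):
        if not values:
            return 0.0
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    best = (None, float("inf"))
    candidates = sorted(set(x))
    # Assumption: thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(candidates, candidates[1:]):
        threshold = (lo + hi) / 2
        left = [yi for xi, yi in zip(x, y) if xi < threshold]
        right = [yi for xi, yi in zip(x, y) if xi >= threshold]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best


# Toy data (hypothetical, not PISA values): weekly minutes and a test score.
x = [100, 150, 200, 300, 350, 400]
y = [400, 410, 405, 520, 530, 525]
threshold, total_sse = best_split(x, y)
```

A full tree would apply the same search to every predictor, keep the best one, and then repeat the procedure inside each child node.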

MARS was first introduced by Jerome H. Friedman (1991) and can be defined as follows:

$$f(x) = \beta_0 + \sum_{m=1}^{M} \beta_m \prod_{k=1}^{K_m} \left[ s_{km} \left( x_{v(k,m)} - t_{km} \right) \right]_+ \qquad (1)$$

In Equation 1, $M$, $K_m$ and the $\beta$s are the number of basis functions, the number of knots and the
parameters, respectively. $s_{km}$ takes on the value of either 1 or -1. $v(k,m)$ and $t_{km}$ indicate the label
of the independent variable and the knot location, respectively.
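
As a hedged illustration of how a model of this form is evaluated, the sketch below computes a prediction as an intercept plus a weighted sum of products of truncated linear (hinge) terms, each of the form [s(x - t)]_+ with sign s and knot t. The function names, the data layout and the three-term model are hypothetical and are not taken from this research.

```python
def hinge(z):
    """Truncated linear function [z]_+ = max(0, z)."""
    return max(0.0, z)


def mars_predict(x, intercept, terms):
    """Evaluate a MARS-type model for one observation.

    x     : dict mapping variable name -> value
    terms : list of (coefficient, factors); each factor is a tuple
            (sign, variable, knot) contributing hinge(sign * (x[variable] - knot))
    """
    total = intercept
    for coefficient, factors in terms:
        product = 1.0
        for sign, variable, knot in factors:
            product *= hinge(sign * (x[variable] - knot))
        total += coefficient * product
    return total


# Hypothetical model with two main effects and one 2-way interaction.
terms = [
    (2.0, [(+1, "a", 1.0)]),                    # 2.0 * [a - 1]_+
    (-3.0, [(-1, "a", 1.0)]),                   # -3.0 * [1 - a]_+
    (0.5, [(+1, "a", 1.0), (-1, "b", 4.0)]),    # 0.5 * [a - 1]_+ * [4 - b]_+
]
pred = mars_predict({"a": 3.0, "b": 2.0}, 10.0, terms)
```

Each pair of mirrored hinge terms lets the slope differ on either side of a knot, which is how the model represents different functions on distinct intervals of a variable.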

The general MARS method is constructed in a two-step process. First, in the forward stepwise process, all the
possible basis functions produced from the independent variables are added, and knots are located, to improve
prediction. This continues until the number of basis functions reaches a predetermined maximum. Then, in the
backward stepwise process, the best model is obtained by eliminating some basis functions from the most complex
model to prevent overfitting. Generalized Cross-Validation (GCV) is used to measure goodness of fit; it penalizes
large numbers of basis functions and thereby reduces the probability of overfitting. To measure variable
importance, a variable is excluded from the model and the GCV value is re-calculated and compared with the
previous GCV value. The importance values are on a scale of 0-100: the variable associated with the largest
decrease in the GCV value scores 100 and is the most important variable. MARS has been commonly used by
researchers because of the following advantages: (1) MARS is flexible in specifying nonlinear relationships
between a dependent variable and the independent variable(s) without the model assumptions of regression methods.
(2) MARS gives different functions for distinct intervals of the independent variables. It can analyze not only
the effect of the independent variables on the dependent variable but also interactions of all degrees among the
independent variables, and it can show the effect of these interactions on the dependent variable. (3) MARS is a
stepwise regression model that can be understood and interpreted more easily than other classification
techniques. (4) There is no restriction on variable type; variables may be categorical or continuous
(Garcia Nieto et al., 2017; Lee & Chen, 2005).
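
For reference, the GCV criterion is commonly written, following Friedman (1991), as the mean squared residual inflated by a complexity penalty. In the expression below, N is the number of observations, f_M is the fitted model with M basis functions, and C(M) is its effective number of parameters. The exact form of C(M) depends on software settings that are not reported here, so it is left generic as an assumption of this sketch.

```latex
\mathrm{GCV}(M) = \frac{\dfrac{1}{N}\sum_{i=1}^{N}\bigl(y_i - f_M(x_i)\bigr)^2}{\bigl(1 - C(M)/N\bigr)^2}
```

Because C(M) grows with M, adding a basis function lowers GCV only when the reduction in residual error outweighs the larger penalty.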


Data Analysis 


RStudio was used for data analysis. There are many packages for machine learning algorithms in R; the most
commonly used packages, earth (for MARS) and caret (for CART), were used in this research.

First, descriptive statistics were computed. Second, to avoid overfitting and underfitting, k-fold
cross-validation with n repeats was run for each algorithm (Kuhn & Johnson, 2013). In this research, 10-fold
cross-validation with 10 repeats was used. This process had the following steps:

1. The data were randomly divided into 10 equal parts.

2. Nine of the 10 parts were used as training samples and the remaining part was used as a test sample.

3. Steps 1 and 2 were repeated 10 times, and each time the algorithm chose a different
portion as the testing data.
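
The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in this research: the function `repeated_kfold` is hypothetical, every part is assumed to serve once as the test sample within each repeat, and the data are reshuffled at the start of each repeat. The sample size 4,569 is the number of Turkish students analyzed.

```python
import random


def repeated_kfold(n_samples, k=10, repeats=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation
    repeated `repeats` times with a fresh random partition each time."""
    rng = random.Random(seed)
    for _ in range(repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)                        # step 1: random partition
        folds = [indices[i::k] for i in range(k)]   # k parts of near-equal size
        for i in range(k):                          # step 2: one part is the test sample
            test_idx = folds[i]
            train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
            yield train_idx, test_idx


splits = list(repeated_kfold(n_samples=4569))
```

With 10 folds and 10 repeats the model is fitted 100 times, and the accuracy statistics are averaged over the 100 held-out parts.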

After 10-fold cross-validation with 10 repeats, the most important factors that had a significant effect
on science achievement were determined. Furthermore, these factors were ranked by their importance for science
achievement. Finally, the results of MARS and CART were compared in terms of model accuracy statistics, namely
R², Mean Absolute Error (MAE), Mean Square Error (MSE) and Root Mean Square Error (RMSE), to find the best model.
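
For readers who want to reproduce these statistics, a minimal sketch is given below. The function `fit_statistics` and the toy scores are hypothetical. R² is computed here as 1 - SSE/SST, which is an assumption: some packages instead report the squared correlation between observed and predicted values, and the definition used in this research is not stated.

```python
import math


def fit_statistics(y_true, y_pred):
    """Return R2, MAE, MSE and RMSE for observed and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_y = sum(y_true) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    # Assumption: R2 defined as 1 - SSE/SST.
    r2 = 1.0 - sum(e * e for e in errors) / ss_tot
    return {"R2": r2, "MAE": mae, "MSE": mse, "RMSE": rmse}


# Toy observed and predicted science scores (hypothetical values).
stats = fit_statistics([400, 450, 500, 550], [410, 440, 520, 540])
```

Since RMSE is the square root of MSE, and MAE can never exceed RMSE, the four statistics can also be used to cross-check one another.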


Results of Research 
The indexed variables described in Table 1 were used in the CART and MARS algorithms. In general, higher
values of these variables have a positive meaning for the related topic. The descriptive statistics of the
student-related variables for Turkey and Singapore are given in Table 2.


Table 2. The comparison of descriptive statistics between Turkey and Singapore.

              Turkey                                          Singapore
Variables     Mean        SD          Min         Max         Mean        SD          Min         Max
pv1scie       433         76          218         708         562         98          228         888
adinst        0.10        0.95        -1.97       2.05        0.41        0.89        -1.97       2.05
anxtest       0.33        1.03        -2.51       2.55        0.57        0.95        -2.51       2.55
belong        -0.40       1.12        -3.13       2.60        -0.22       0.88        -3.13       2.61
cooperate     0.02        1.10        -3.33       2.29        0.33        1.01        -3.33       2.29
cpsvalue      -0.03       0.92        -2.83       2.10        0.26        1.03        -2.83       2.10
cultposs      -0.23       0.87        -1.71       2.46        -0.19       0.98        -1.63       2.56
disclisci     -0.12       0.94        -2.42       1.88        0.19        0.89        -2.42       1.88
emosups       -0.23       1.06        -3.08       1.10        -0.24       0.97        -3.08       1.10
envaware      0.57        1.43        -3.38       3.29        0.43        1.10        -3.38       3.29
envopt        -0.60       1.42        -1.79       3.01        -0.07       1.14        -1.79       3.01
epist         -0.18       1.15        -2.19       2.16        0.24        0.89        -2.19       2.16
escs          -1.40       1.15        -4.65       2.20        0.02        0.90        -4.05       3.50
hedres        -0.54       1.11        -4.37       1.18        0.17        1.01        -4.37       1.18
homepos       -1.38       1.09        -6.71       3.05        -0.11       0.89        -5.43       5.12
ibteach       0.31        1.14        -3.34       3.18        0.00        0.84        -3.34       [illegible]
ictres        -1.15       0.94        -3.27       3.50        0.20        0.93        -3.27       3.50
instscie      0.39        0.89        -1.93       1.74        0.53        0.80        -1.93       1.74
intbrsci      -0.03       1.01        -2.58       2.13        0.31        0.88        -2.55       2.60
joyscie       0.13        1.14        -2.12       2.16        0.62        0.97        -2.12       2.16
mmins         228         76          0           640         309         142         0           1800
motivat       0.64        0.99        -3.09       1.85        0.43        0.94        -3.09       1.85
perfeed       0.33        0.96        -1.53       2.50        0.32        0.91        -1.53       2.50
scieact       0.68        [illegible] -1.76       3.36        0.20        1.08        -1.76       3.36
scieeff       0.36        1.27        -3.76       3.28        0.12        1.09        -3.76       3.28
smins         209         105         0           800         333         167         0           1920
tdteach       -0.06       0.96        -2.45       2.08        0.27        0.93        -2.45       2.08
teachsup      0.19        0.98        -2.12       1.45        0.30        0.87        -2.12       1.45
unfairtchr    10.2        3.9         4.0         24.0        9.9         [illegible] 5.0         24.0
wealth        -1.45       0.98        -4.93       4.09        -0.18       0.83        -4.77       4.08
autict        n/a         n/a         n/a         n/a         0.20        0.96        -2.50       2.10
compict       n/a         n/a         n/a         n/a         -0.01       0.89        -2.66       1.97
entuse        n/a         n/a         n/a         n/a         -0.10       0.81        -3.71       4.85
homesch       n/a         n/a         n/a         n/a         0.02        0.89        -2.69       3.60
icthome       n/a         n/a         n/a         n/a         7.94        1.93        0.00        11.00
ictsch        n/a         n/a         n/a         n/a         6.70        2.13        0.00        10.00
intict        n/a         n/a         n/a         n/a         0.28        0.92        -2.96       2.64
soiaict       n/a         n/a         n/a         n/a         0.15        0.91        -2.14       2.43
usesch        n/a         n/a         n/a         n/a         0.00        0.92        -1.67       3.63


The CART tree diagram for the Turkey dataset is given in Figure 1.

Figure 1. CART diagram for Turkey dataset (based on min relative error).

According to the CART diagram, environmental optimism was the most important variable for science achievement.
The variables from the most important to the least important for achievement were envopt, smins, homepos,
wealth, ictres, envaware, escs, epist, ibteach and anxtest, respectively.

The cutoff points of the variables and the interaction effects in the CART model are shown in Figure 2.

Figure 2. The cutoff points of the variables in the CART model for Turkey dataset.


For the CART model on the Turkey dataset, the R², MAE, MSE and RMSE statistics were 0.332, 49.773, 3860.814
and 62.135, respectively.

The MARS algorithm with 2-way interactions was applied to the Turkey dataset; Figure 3 shows the cut points of
the variables that had a statistically significant effect on achievement.

37 basis functions were determined for the MARS analysis, and the coefficients of the significant variables and
interaction effects are given in Table 3.


Table 3. MARS output for Turkey dataset.

Basis Functions                             Coefficient
(Intercept)                                 467.471
gender                                      -9.336
h(adinst- -0.3816)                          7.304
h(-1.7251-anxtest)                          -21.941
h(anxtest- -1.7251)                         -6.585
h(0.2085-cooperate)                         -6.825
h(0.1184-envaware)                          -9.268
h(envaware-0.1184)                          2.907
h(2.2486-envopt)                            19.060
h(epist- -1.5276)                           16.462
h(epist-1.5636)                             -24.364
h(escs- -0.6417)                            53.250
h(-0.2221-homepos)                          -14.896
h(-0.4459-ictres)                           -11.031
h(-0.1713-perfeed)                          17.944
h(1.8814-scieeff)                           -11.037
h(scieeff-1.8814)                           -7.896
h(400-smins)                                -0.177
h(teachsup-0.9209)                          -15.016
h(-0.7275-wealth)                           8.066
h(wealth- -0.7275)                          -23.194
gender * h(unfairteacher-7)                 -1.420
gender * h(wealth-0.0849)                   38.563
h(anxtest- -1.7251) * h(mmins-240)          -0.050
h(-1.0005-disclisci) * h(epist- -1.5276)    -6.952
h(-1.3298-emosups) * h(epist- -1.5276)      -9.265
h(emosups- -1.3298) * h(epist- -1.5276)     -3.066
h(envaware-0.1184) * h(joyscie-0.5094)      5.032
h(-1.7476-envopt) * h(escs- -0.6417)        -723.513
h(envopt- -1.7476) * h(escs- -0.6417)       -7.913
h(2.2486-envopt) * h(ibteach- -0.8489)      -2.817
h(2.2486-envopt) * h(-0.8489-ibteach)       -2.387
h(2.2486-envopt) * h(scieact- -0.0265)      -3.170
h(2.2486-envopt) * h(-0.0265-scieact)       -3.560
h(-0.6417-escs) * h(90-smins)               0.182
h(1.3364-scieact) * h(1.8814-scieeff)       2.730
h(scieact-1.3364) * h(1.8814-scieeff)       5.039
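
As a worked reading of one row of Table 3, the sketch below evaluates the main-effect term for science learning time, -0.177 * h(400 - smins), where h(z) = max(0, z). The helper names are hypothetical. Only this single term is evaluated, so the result is a partial contribution with all other terms held fixed, not a predicted score.

```python
def h(z):
    """Hinge function used in the MARS output: max(0, z)."""
    return max(0.0, z)


def smins_main_effect(smins):
    """Contribution of the Table 3 term -0.177 * h(400 - smins)."""
    return -0.177 * h(400 - smins)


at_turkish_mean = smins_main_effect(209)   # Table 2 mean of smins for Turkey
at_knot = smins_main_effect(400)           # the term vanishes at and above the knot
```

At the Turkish mean of 209 minutes the term contributes about -33.8 points relative to a student at or above the 400-minute knot. The interaction term that also involves smins, h(-0.6417-escs) * h(90-smins), is zero for any student with at least 90 minutes per week.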


Similar to the CART algorithm, the variables from the most important to the least important for achievement in
the MARS analysis were envopt, homepos, smins, epist, envaware, ibteach, anxtest, escs, joyscie, scieact, gender,
unfairteacher, scieeff, mmins, perfeed, wealth, cooperate, emosups, adinst, teachsup, disclisci and ictres. For
the MARS analysis of the Turkey dataset, the R², MAE, MSE and RMSE statistics were 0.417, 46.349, 3365.63 and
58.015, respectively.

In this research, the CART and MARS algorithms were also applied to the Singapore dataset to compare the results
of the two algorithms. The CART tree diagram of the Singapore dataset is given in Figure 4.

Figure 4. CART diagram for Singapore dataset (based on min relative error).

Based on Figure 4, science learning time (minutes per week) was the most important variable for science
achievement. The variables affecting achievement (from the most important to the least important) were smins,
mmins, envaware, escs, scieeff, epist, homepos, disclisci, envopt, unfairteacher, tdteach and ictsch, respectively.

When the cut points of the variables were examined, students whose smins was higher than 290 minutes, mmins
lower than 412 minutes, scieeff score higher than 0.46 and epist score higher than 0.55 had the highest science
score, 661. On the other hand, students whose smins was lower than 290 minutes, envaware score lower than -0.63
and envopt score higher than 0.71 had the lowest science score, 419.
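
The two extreme branches described above can be written as explicit rules. The sketch below encodes only these two paths of the Singapore tree. The function name is hypothetical, the remaining branches of Figure 4 are not reproduced, and the handling of values exactly equal to a cut point is an assumption because the text gives only strict comparisons.

```python
def singapore_cart_extreme_leaf(s):
    """Return the mean science score of the matching extreme leaf, or None
    when the student falls in a branch that is not encoded here."""
    if (s["smins"] > 290 and s["mmins"] < 412
            and s["scieeff"] > 0.46 and s["epist"] > 0.55):
        return 661   # highest-scoring leaf
    if s["smins"] < 290 and s["envaware"] < -0.63 and s["envopt"] > 0.71:
        return 419   # lowest-scoring leaf
    return None


# Hypothetical student profiles.
high = {"smins": 300, "mmins": 400, "scieeff": 0.5, "epist": 0.6,
        "envaware": 0.0, "envopt": 0.0}
low = {"smins": 200, "mmins": 300, "scieeff": 0.0, "epist": 0.0,
       "envaware": -1.0, "envopt": 1.0}
```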

The cutoff points of the variables in the CART algorithm are shown in Figure 5.

Figure 5. The cutoff points of the variables in the CART model for Singapore dataset.


For the CART model on the Singapore dataset, the R², MAE, MSE and RMSE statistics were 0.390, 60.621, 5872.865
and 76.634, respectively.

The MARS algorithm with 2-way interactions was applied to the Singapore dataset, and the cutoff points of the
variables are shown in Figure 6.

Figure 6. The cutoff points of the variables in the MARS model for the Singapore dataset.


48 basis functions were determined for the MARS analysis, and the coefficients of the significant variables and
interaction effects are given in Table 4.

Table 4. MARS output for Singapore dataset.

Basis Functions                             Coefficients
(Intercept)                                 664.857
gender                                      -30.874
h(anxtest- -1.9346)                         -6.490
h(0.3821-cpsvalue)                          11.953
h(cpsvalue-0.3821)                          10.685
h(cultposs- -1.1752)                        -5.213
h(0.8351-disclisci)                         -13.327
h(0.3361-envaware)                          -18.450
h(envopt- -0.3192)                          -12.635
h(0.84-epist)                               -15.757
h(-1.6073-escs)                             40.991
h(escs- -1.6073)                            17.183
h(0.578-homepos)                            -22.209
h(-0.388-homesch)                           -11.277
h(homesch- -0.388)                          -20.172
h(0.1368-ibteach)                           -11.813
h(ibteach-0.1368)                           -10.854
h(8-icthome)                                -3.048
h(icthome-8)                                -7.622
h(6-ictsch)                                 -4.637
h(ictsch-6)                                 -8.416
h(-0.3944-intict)                           -21.561
h(joyscie- -1.0286)                         [illegible]
h(300-mmins)                                -0.206
h(mmins-300)                                -0.069
h(0.6178-perfeed)                           11.132
h(perfeed-0.6178)                           -17.399
h(0.4035-scieact)                           -5.511
h(scieact-0.4035)                           -13.441
h(-0.999-soiaict)                           -16.011
h(soiaict- -0.999)                          -7.363
h(1.4501-tdteach)                           -8.063
h(unfairteacher-9)                          -3.509
gender * h(belong- -0.5178)                 -14.235
gender * h(mmins-540)                       0.097
gender * h(540-mmins)                       0.117
gender * h(360-smins)                       -0.234
h(adinst- -0.3816) * h(cultposs- -1.1752)   3.637
h(adinst- -0.3816) * h(8-icthome)           3.875
h(1.515-autict) * h(soiaict- -0.999)        -6.368
h(0.0226-compict) * h(homesch- -0.388)      16.333
h(compict-0.0226) * h(homesch- -0.388)      9.222
h(-1.039-cpsvalue) * h(joyscie- -1.0286)    -11.710
h(cpsvalue- -1.039) * h(joyscie- -1.0286)   -5.259
h(-1.1752-cultposs) * h(unfairteacher-12)   11.652
h(-1.1752-cultposs) * h(12-unfairteacher)   5.907
h(0.3708-instscie) * h(1.4501-tdteach)      6.997
h(joyscie- -1.0286) * h(1.4886-scieeff)     -6.261


In order to evaluate the performance of the machine learning algorithms in terms of prediction accuracy, the
results of the algorithms are summarized in Table 5.


Table 5. The goodness of fit statistics comparison.

                            Turkey                  Singapore
Goodness of Fit Criteria    CART        MARS        CART        MARS
R²                          0.332       0.417       0.390       0.552
MAE                         47.776      46.349      60.621      52.115
MSE                         3860.81     3365.63     5872.87     4310.93
RMSE                        62.14       58.02       76.63       65.66
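
To express the gap between the two algorithms on a common scale, the RMSE values reported in Table 5 can be converted into a relative reduction. The sketch below is a simple arithmetic illustration. The helper name is hypothetical and the percentages are derived here from the reported RMSE values rather than reported by the research itself.

```python
# RMSE values as reported in Table 5.
rmse = {
    "Turkey": {"CART": 62.14, "MARS": 58.02},
    "Singapore": {"CART": 76.63, "MARS": 65.66},
}


def relative_reduction(cart, mars):
    """Share of the CART error that is removed by switching to MARS."""
    return (cart - mars) / cart


reductions = {country: relative_reduction(v["CART"], v["MARS"])
              for country, v in rmse.items()}
```

On these figures MARS lowers RMSE by roughly 6.6% for Turkey and 14.3% for Singapore.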


The analytic results demonstrated that the MARS algorithm had lower MAE, MSE and RMSE and higher R²
values than the CART algorithm in both Turkey and Singapore. The variable importance comparison of the two
algorithms is summarized in Table 6.

Table 6. Variable importance.

              Turkey                        Singapore
Importance    CART        MARS              CART              MARS
1             envopt      envopt            smins             escs
2             smins       homepos           mmins             envaware
3             homepos     smins             envaware          joyscie
4             wealth      epist             escs              ictsch
5             ictres      envaware          scieeff           autict
6             envaware    ibteach           epist             soiaict
7             escs        anxtest           homepos           unfairteacher
8             epist       escs              disclisci         gender
9             ibteach     joyscie           envopt            smins
10            anxtest     scieact           unfairteacher     envopt
11                        gender            tdteach           disclisci
12                        unfairteacher     ictsch            cpsvalue
13                        scieeff                             epist
14                        mmins                               perfeed
15                        perfeed                             homepos
16                        wealth                              icthome
17                        cooperate                           scieeff
18                        emosups                             mmins
19                        adinst                              homesch
20                        teachsup                            belong
21                        disclisci                           tdteach
22                        ictres                              anxtest
23                                                            intict
24                                                            instscie
25                                                            compict
26                                                            ibteach
27                                                            scieact
28                                                            cultposs
29                                                            adinst

1: the most important; 29: the least important


Table 6 reveals that the top three most important variables for science achievement were the same in the
CART and MARS algorithms for the Turkey dataset, although their ranking differed. Furthermore, the MARS
algorithm used more variables than the CART algorithm to explain science achievement.

Similar to the variable importance results for the Turkey dataset, more variables were used by the MARS
algorithm in the Singapore dataset. However, the top three most important variables for science achievement
differed between the CART and MARS algorithms in the Singapore dataset; only envaware was common to both.

As a result, the MARS algorithm used more variables and produced much more sensitive results than the CART
algorithm in this dataset. In addition, MARS produced higher R² and lower MAE, MSE and RMSE values than the
CART algorithm. Thus, it could be said that the MARS algorithm outperformed the CART algorithm in this research.


Discussion 


During the past decades, student-related factors affecting academic achievement have become a common
and important topic for educational systems. Research institutions and government agencies are evaluating this
topic to develop actionable decisions on students’ achievement. In this research, the examination of variable
importance revealed that environmental optimism, home possessions and science learning time (minutes per week)
were the most important factors to be improved in order to increase Turkish students’ science achievement. This
result parallels the results of earlier research in the literature
(Hoy, Tarter, & Hoy, 2006; Littledyke, 2008; Yang, 2010; Singh, Granville, & Dika, 2010).

According to the average environmental optimism score and the basis function results of MARS, Turkish
students do not have enough knowledge about its sub-criteria: air pollution, extinction of plants
and animals, clearing of forests for other land use, water shortages, nuclear waste, the increase of greenhouse
gases in the atmosphere and the use of genetically modified organisms. Moreover, their expectations for the
future of these criteria are not optimistic. According to the MARS results, this variable should be higher than
2.2486 to increase science achievement for Turkish students. The impact of environmental optimism on Singaporean
students’ science achievement was not as important as its impact on Turkish students’ science achievement.
Although Turkey’s government began stepping up its environmental awareness in the early 1980s, a major policy was
included in the country’s 10th development plan in 2014. These policies aim to increase the development of
Turkey’s renewable energy, prevention of and adaptation to climate change, conservation of biodiversity, soil
erosion control, reforestation and the fight against desertification (Smith, 2015). However, it is concluded that
the importance of these programs has not been realized by the government. Also, Turkey ranks 37th out of 38
countries in the environment category of the OECD’s Better Life Index. This result corresponds to the findings of
the OECD’s report (OECD, 2017). When Singapore’s environmental policy is examined, it is seen that its
environmental awareness started in the early 1970s. Hays (2008) emphasized that the Singapore government
has been aware of the requirements for environmental protection since the 1970s. For this reason, Minister’s
Offices were established, and these offices carried out many programs to handle environmental issues, such as
cleaning up rivers and streams, moving animals to resettlement areas and controlling discharges from small
industries. Because of the early implementation of environmental action plans in Singapore, it can be inferred
that the average environmental optimism score of Singapore is higher than that of Turkey. Thus, we expect that
environmental awareness and optimism will increase in the near future following the action plans taken by the
Turkish government.

In this research, another important factor affecting Turkish students’ science achievement was home
possessions. The home possessions variable consists of 20 different items: a desk to study at, a room of their
own, a quiet place to study, a computer they can use for school work, educational software, a link to the
Internet, classic literature, books of poetry, books to help with school work, a dictionary, books on art, music
or design, televisions, cars, rooms with a bath or shower, cell phones with internet access, computers, tablet
computers, e-book readers, musical instruments and the number of books. The average home possessions score was
-1.38 for Turkey and -0.11 for Singapore. Based on the MARS results, this variable should be higher than -0.2221
for a better science achievement score. Similar to previous research, home possessions, which were highly
correlated with socio-economic status (Turmo, 2004; Marks, Cresswell, & Ainley, 2006), had a significant effect on
students’ science achievement in this research. Gallup conducted a survey from November 11th to December
25th, 2013 in order to understand media usage behavior in Turkey (Gallup, 2013). Of the 2,020 people who
participated in the survey, 99% had a television, 72% had a working computer and 68% had an internet connection,
and 78% of the population had a cell phone. The importance of home possessions for science achievement was
relatively lower for Singaporean students than for Turkish students. In Singapore, by comparison, 98% of people
had a television, 83% had a working computer, 78% had internet access and 97% had a cell phone (Department of
Statistics Singapore, 2018). In addition, gross domestic product per capita was $55,235 in Singapore and $14,933
in Turkey as of December 2017 (Trading Economics, 2018). The socioeconomic and cultural status of Singapore is
evidently better than that of Turkey. Thus, the Turkish government should create action plans to raise the
standard of living to a healthy and decent level.

The results of this research corresponded to the findings of other studies in the literature (Chandler & 
Swartzentruber, 2011; Sha, Schunn, & Bathgate, 2015). Science learning time had a significant effect on Turkish 
students’ science achievements. Science learning time was 209 minutes on average for Turkey and 333 minutes 
for Singapore in this research. According to the MARS results, an average of science learning time should be 
over 400 minutes per week for being more effective. This means that the current curriculum should be updated 
immediately. Science learning time should first be increased to at least 333 minutes per week and then 
to 400 minutes in order to reach a high level of science achievement. 

After evaluating the important factors affecting Turkish students’ science achievement, it was found 
that family wealth, adaption of instruction, environmental awareness, epistemological beliefs and information and 
communication technology (ICT) resources had a significant impact on students’ science 
achievement (Geske & Kangro, 2002; Ozdem, Cavas, Cavas, Cakiroglu, & Ertepinar, 2010; Zahorec, Haskova, Munk, 
& Bilek, 2013; Henno & Reiska, 2013; Rannikmäe, 2016). In this research, these variables should be higher than 
0.7275, 0.3816, 0.1184, 1.5276 and 0.4459, respectively, in order to reach a high level of science achievement. 
When the MARS results for Singapore were examined, it was revealed that the index of economic, social and cultural 
status, environmental awareness, enjoyment of science, the ICT available at school index and students’ perceived 
autonomy related to ICT use should be higher than 1.6073, 0.3361, 1.0286, 6.000 and 1.515, respectively, 
in order to reach a high level of science achievement. 
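
The thresholds reported above presumably correspond to knot locations in the fitted MARS models. As a minimal illustration of how a knot acts, the sketch below evaluates a single right-sided hinge basis function, max(0, x - knot), at the family wealth threshold of 0.7275 reported for Turkey. The function name `hinge` and the sample index values are assumptions made for illustration only; the coefficients of the fitted model are not reproduced here.

```python
def hinge(x: float, knot: float) -> float:
    """Right-sided MARS hinge basis function: max(0, x - knot)."""
    return max(0.0, x - knot)


# Threshold reported in the text for family wealth (Turkey),
# treated here as a knot location (an assumption).
FAMILY_WEALTH_KNOT = 0.7275

# Hypothetical index values for three students (not PISA data).
for wealth in (0.2, 0.7275, 1.2):
    # The basis function is zero at or below the knot and grows
    # linearly above it.
    print(wealth, hinge(wealth, FAMILY_WEALTH_KNOT))
```

Under this reading, a variable contributes through such a term only once it exceeds the knot, which is why the recommendations are phrased as values the variables "should be higher than".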

According to the research findings, MARS outperformed CART in terms of predicting students’ science 
achievement. The performance analysis indicated that MARS had the highest coefficient of determination, with 
values of 42% for Turkey and 55% for Singapore. 
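
For readers unfamiliar with the comparison criterion, the sketch below shows one common way to compute the coefficient of determination, R² = 1 - SS_res / SS_tot, for two sets of predictions. The function name `r_squared` and all numbers are hypothetical toy values chosen for illustration; they are not PISA scores, and the exact computation used in this research may differ.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Toy values (hypothetical, not taken from the study).
observed = [400.0, 450.0, 500.0, 550.0]
model_a = [410.0, 440.0, 510.0, 540.0]  # predictions of a hypothetical model A
model_b = [450.0, 450.0, 500.0, 500.0]  # predictions of a hypothetical model B

# The model with the larger R^2 explains more of the variance
# in the observed scores.
print(r_squared(observed, model_a))
print(r_squared(observed, model_b))
```

An R² of 42% therefore means that the model accounts for 42% of the variance in the observed achievement scores.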


Conclusions 


In overview, science achievement is critical for developing science literacy, which is explored by the PISA surveys. 
PISA assesses students’ science knowledge as well as what they can do and how they can apply scientific 
knowledge in real life. According to the results of PISA 2015, the science achievement of 
Turkish students is lower than the OECD average. Therefore, it is necessary to explore how to develop scientific 
literacy. 

It was determined that environmental optimism had a significant positive effect on students’ science 
achievement. The government should support people with a high level of knowledge about environmental 
awareness in providing training or workshops. In addition, these people can be invited as speakers 
in the training programs of high schools or companies in order to increase environmental awareness. Providing in-service 
training to teachers and educators or preparing a public service announcement could be another way 
to promote environmental knowledge. 

Home possessions were the second important factor that had a significant positive effect on students’ 
achievement. Considering the sub-criteria of home possessions, it is difficult to 
increase students’ home possessions score in a short time. However, parents should provide their children with 
a desk to study at and a quiet place to study. It is also critical for students’ science achievement that 
students have a computer with internet access both at home and at school. Since providing a computer 
and internet access is a budgeting issue for parents, the government should support schools in setting up at 
least one computer laboratory with internet access. 

Science learning time was another factor affecting students’ science 
achievement. Science learning time per week was 209 minutes in Turkey, which was below the overall PISA average. 
In light of this research, it is recommended that science learning time per week be increased and the 
curriculum be enriched. To this end, educational policymakers and the government should work on 
an effective educational plan. 

Although this research provides a reference for future research in this field, it has some limitations. 
Firstly, in order to explain students’ science achievement, the analysis could include 
not only student variables but also the teacher-related and school-related variables obtained by the PISA 
survey. Secondly, Singapore was the most successful country in PISA 2015 in terms of science achievement, so 
PISA data were analyzed to compare Turkey and Singapore in this research. Other countries 
participating in the PISA survey may also be analyzed to compare students’ achievement. Thirdly, there are 
many machine learning techniques for solving different problems. The MARS and CART algorithms were used in this 
research. In order to measure prediction and classification performance, different machine learning techniques 
can be tried in future research. 